- Title
- Efficient stereo semantic segmentation for low powered computing devices
- Creator
- Biddulph, Alexander Stewart
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2024
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- This thesis develops and evaluates techniques that improve the computer vision capabilities of low-powered computing devices. The focus is primarily on stereo vision, where the developed techniques prove faster at inference than, and comparable in accuracy to, similar state-of-the-art work. Recently, a new approach to deep learning for computer vision, named the Visual Mesh, was developed. The Visual Mesh provides a formulation that allows object detection in high-resolution images to be conducted at frame rates in excess of 100 fps, what might be termed "hyper real-time", on low-powered computing platforms. While the Visual Mesh has only been demonstrated to be effective for single-class object detection, it holds promise for further development in other areas of computer vision, such as multi-class semantic segmentation and stereo semantic segmentation. Fast and accurate semantic segmentation is an important function in all applications of humanoid robots. One way to increase the potential accuracy of a vision system is to increase the resolution of the camera image: a higher resolution can increase the number of pixels representing an object, making the object easier to distinguish, especially when it is far away. However, increased resolution typically means a larger memory footprint and longer computation time, so lower resolutions are usually favoured to reduce both. The Visual Mesh, however, suffers only a minor increase in computation time as image resolution is increased, allowing it to process high-resolution images at very high frame rates. This research aims to show that both monocular and stereo multi-class semantic segmentation are achievable with the Visual Mesh on high-resolution images, at high frame rates, on low-powered computing devices.
The multi-class Visual Mesh is expected to have similar frame rates to the original single-class Visual Mesh. The stereo Visual Mesh is expected to achieve roughly 50% of the frame rate of the original Visual Mesh. The author is not aware of any deep-learning-based approaches to semantic segmentation, either monocular or stereo, capable of running at these speeds and image resolutions on a low-powered computing device. The closest known contenders operate at under 10 fps and at less than half the image resolution. The monocular and stereo multi-class Visual Meshes are evaluated against state-of-the-art semantic segmentation networks and are shown to have much faster inference speeds with minor losses in classification performance. The developed techniques and the evaluation presented in this thesis are important for the field of computer vision. They present solutions that improve the computer vision capabilities of low-powered computing devices as well as less constrained systems.
- Subject
- low-powered computing devices; stereo vision; computer vision; Visual Mesh
- Identifier
- http://hdl.handle.net/1959.13/1507571
- Identifier
- uon:56035
- Rights
- Copyright 2024 Alexander Stewart Biddulph
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 16 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 862 KB | Adobe Acrobat PDF